Classification of Web Documents using Fuzzy Logic Categorical Data Clustering
Abstract
We propose a fuzzy clustering algorithm for categorical data to classify web documents. We extract a number of words for each thematic area (category) and treat each word as a multidimensional categorical data vector. For each category, the algorithm partitions the available words into a number of clusters, where the center of each cluster corresponds to a word. The dissimilarity between two words is measured by the Hamming distance. A new document is then classified in two steps. First, we compute the minimum distance between the document and the cluster centers of each category. Second, we select the smallest of these minimum distances and assign the document to the category that corresponds to it.
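The two-step classification rule can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the cluster centers have already been produced by the clustering step, and that a document is represented as a categorical vector of the same length as the centers (the abstract does not say how this representation is built). The names `hamming`, `classify`, and the toy data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[str]  # assumed representation: one categorical value per dimension


def hamming(a: Vector, b: Vector) -> int:
    """Number of positions at which two equal-length categorical vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def classify(doc: Vector, centers: Dict[str, List[Vector]]) -> Tuple[str, int]:
    """Return the winning category and its distance to the document."""
    # Step 1: for each category, the minimum distance to any of its cluster centers.
    per_category = {
        category: min(hamming(doc, center) for center in category_centers)
        for category, category_centers in centers.items()
    }
    # Step 2: pick the category with the smallest of those minimum distances.
    best = min(per_category, key=per_category.get)
    return best, per_category[best]


# Toy, made-up cluster centers for two hypothetical categories.
centers = {
    "sports": [("a", "b", "c", "d"), ("a", "b", "x", "y")],
    "finance": [("p", "q", "r", "s")],
}
doc = ("a", "b", "c", "z")
print(classify(doc, centers))  # ('sports', 1)
```

Ties between categories are resolved here by dictionary order, which is an arbitrary choice; the abstract does not specify a tie-breaking rule.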
Similar resources
Web Document Clustering Using Fuzzy Equivalence Relations
Conventional clustering classifies the given data objects into exclusive subsets (clusters). That means we can discriminate clearly whether an object belongs to a cluster or not. However, such a partition is insufficient to represent many real situations. Therefore, a fuzzy clustering method is offered to construct clusters with uncertain boundaries and allows that one object belongs to overl...
A New Approach to Classify Text based on CosFuzzy Logic
Evaluating objective-type examinations by computer is easy, but evaluating descriptive-type questions is more difficult, and no significant research has taken place. In this paper I propose a new solution to this problem: text classification using a new fuzzy logic named CosFuzzy Logic. Document clustering is a useful technique that organizes a large quanti...
Optimization of a Search Engine for an Organized and Effective Browsing
In web search applications, queries are submitted to search engines to represent the information needs of users. The task is to discover the number of diverse user search goals for a query and to depict each goal automatically with some keywords. Existing work proposes a novel approach to infer user search goals by analyzing search engine query logs. First propose a novel approach to infer user searc...
Modified Particle Swarm Optimization Based Adaptive Fuzzy K-Modes Clustering for Heterogeneous Medical Databases
The main purpose of data mining is to extract hidden predictive knowledge, in the form of useful information and patterns, from large databases for use in decision support. The medical field has a large number of heterogeneous databases, in which extracting hidden useful knowledge for the classification of data is difficult. In order to cluster and classify the whole databases o...
Using Fuzzy Logic Clustering Discover Semantic Similarity in Web Document
The complex and dense interactions between terms in documents produce vague and ambiguous meanings. There are complicated associations within one web document and in its links to others. Most existing approaches rely on similarity and feature selection methods. There is a need for clustering of complex documents that produces meaningful documents. This paper proposed methodology is capable of handles...